A New Sentence Compression Dataset and Its Use in an Abstractive Generate-and-Rank Sentence Compressor
Abstract
Sentence compression has attracted much interest in recent years, but most sentence compressors are extractive, i.e., they only delete words. There is a lack of appropriate datasets to train and evaluate abstractive sentence compressors, i.e., methods that apart from deleting words can also rephrase expressions. We present a new dataset that contains candidate extractive and abstractive compressions of source sentences. The candidate compressions are annotated with human judgements for grammaticality and meaning preservation. We discuss how the dataset was created, and how it can be used in generate-and-rank abstractive sentence compressors. We also report experimental results with a novel abstractive sentence compressor that uses the dataset.
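The generate-and-rank idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' compressor: the paraphrase table, the key-word list, and the scoring function are hypothetical placeholders. In a real system the scorer would more plausibly be a model trained on human judgements of grammaticality and meaning preservation, such as those the dataset provides.

```python
from typing import Callable, List

# Hypothetical rewrite rule standing in for an abstractive paraphrase resource.
PARAPHRASES = {"a large number of": "many"}

# Hypothetical words whose loss we treat as a loss of meaning.
KEY_WORDS = {"people", "attended", "meeting"}


def generate_candidates(sentence: str) -> List[str]:
    """Over-generate candidates: single-word deletions plus paraphrase rewrites."""
    words = sentence.split()
    # Extractive candidates: delete exactly one word.
    candidates = [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]
    # Abstractive candidates: replace a known phrase with a shorter one.
    for phrase, shorter in PARAPHRASES.items():
        if phrase in sentence:
            candidates.append(sentence.replace(phrase, shorter))
    return candidates


def toy_score(source: str, candidate: str) -> float:
    """Placeholder scorer: reward kept key words, penalise length.

    The source argument is unused here; a learned ranker would compare
    the candidate against it.
    """
    tokens = candidate.split()
    kept = sum(1 for word in KEY_WORDS if word in tokens)
    return kept - 0.1 * len(tokens)


def rank_candidates(
    source: str,
    candidates: List[str],
    score: Callable[[str, str], float],
) -> List[str]:
    """Sort candidates from best to worst according to the scorer."""
    return sorted(candidates, key=lambda c: score(source, c), reverse=True)


source = "a large number of people attended the meeting"
candidates = generate_candidates(source)
ranked = rank_candidates(source, candidates, toy_score)
print(ranked[0])
```

Keeping generation and ranking as separate functions means the toy scorer can be swapped for a trained one without touching the candidate generator.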
Similar resources
From Extractive to Abstractive Meeting Summaries: Can It Be Done by Sentence Compression?
Most previous studies on meeting summarization have focused on extractive summarization. In this paper, we investigate if we can apply sentence compression to extractive summaries to generate abstractive summaries. We use different compression algorithms, including integer linear programming with an additional step of filler phrase detection, a noisy-channel approach using Markovization formulat...
Supervised Sentence Fusion with Single-Stage Inference
Sentence fusion, the merging of sentences containing similar information, has been shown to be useful in an abstractive summarization context. We present a new dataset of sentence fusion instances obtained from evaluation datasets in summarization shared tasks and use this dataset to explore supervised approaches to sentence fusion. Our proposed inference approach recovers the highest scoring ou...
Generate Compressed Sentences with Stanford Typed Dependencies towards Abstractive Summarization
In this paper, we implement the sentence generation process for abstractive summarization proposed by Genest and Lapalme (2010). We simply use Stanford Typed Dependencies to extract information items and generate multiple compressed sentences via a Natural Language Generation engine. Then we follow LexRank-based sentence ranking combined with greedy sentence selection to build...
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
A Dataset and Evaluation Metrics for Abstractive Compression of Sentences and Short Paragraphs
We introduce a manually created, multi-reference dataset for abstractive sentence and short paragraph compression. First, we examine the impact of single- and multi-sentence level editing operations on human compression quality as found in this corpus. We observe that substitution and rephrasing operations are more meaning preserving than other operations, and that compressing in context improves ...